AITopics | strict saddle

Collaborating Authors

strict saddle

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks

Zhang, Leyang, Zhang, Yaoyu, Luo, Tao

arXiv.org Artificial IntelligenceMay-25-2024

This paper presents a comprehensive analysis of critical point sets in two-layer neural networks. To study such complex entities, we introduce the critical embedding operator and critical reduction operator as our tools. Given a critical point, we use these operators to uncover the whole underlying critical set representing the same output function, which exhibits a hierarchical structure. Furthermore, we prove existence of saddle branches for any critical set whose output function can be represented by a narrower network. Our results provide a solid foundation to the further study of optimization and training behavior of neural networks.

critical point, neural network, operator, (14 more...)

arXiv.org Artificial Intelligence

2405.17501

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Stochastic Subgradient Descent Escapes Active Strict Saddles

Bianchi, Pascal, Hachem, Walid, Schechtman, Sholom

arXiv.org Machine LearningAug-4-2021

In non-smooth stochastic optimization, we establish the non-convergence of the stochastic subgradient descent (SGD) to the critical points recently called active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte \emph{et.al.} for Whitney stratifiable functions, and which is of independent interest. The second assumption, termed the angle condition, allows to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, the SGD converges to a local minimizer.

active strict saddle, stratification, strict saddle, (14 more...)

arXiv.org Machine Learning

2108.02072

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

The Landscape of Matrix Factorization Revisited

Valavi, Hossein, Liu, Sulin, Ramadge, Peter J.

arXiv.org Machine LearningFeb-26-2020

We revisit the landscape of the simple matrix factorization problem. For low-rank matrix factorization, prior work has shown that there exist infinitely many critical points all of which are either global minima or strict saddles. At a strict saddle the minimum eigenvalue of the Hessian is negative. Of interest is whether this minimum eigenvalue is uniformly bounded below zero over all strict saddles. To answer this we consider orbits of critical points under the general linear group. For each orbit we identify a representative point, called a canonical point. If a canonical point is a strict saddle, so is every point on its orbit. We derive an expression for the minimum eigenvalue of the Hessian at each canonical strict saddle and use this to show that the minimum eigenvalue of the Hessian over the set of strict saddles is not uniformly bounded below zero. We also show that a known invariance property of gradient flow ensures the solution of gradient flow only encounters critical points on an invariant manifold $\mathcal{M}_C$ determined by the initial condition. We show that, in contrast to the general situation, the minimum eigenvalue of strict saddles in $\mathcal{M}_{0}$ is uniformly bounded below zero. We obtain an expression for this bound in terms of the singular values of the matrix being factorized. This bound depends on the size of the nonzero singular values and on the separation between distinct nonzero singular values of the matrix.

critical point, eigenvalue, eigenvector, (15 more...)

arXiv.org Machine Learning

2002.12795

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Extending the step-size restriction for gradient descent to avoid strict saddle points

Schaeffer, Hayden, McCalla, Scott G.

arXiv.org Machine LearningAug-5-2019

We provide larger step-size restrictions for which gradient descent based algorithms (almost surely) avoid strict saddle points. In particular, consider a twice differentiable (non-convex) objective function whose gradient has Lipschitz constant L and whose Hessian is well-behaved. We prove that the probability of initial conditions for gradient descent with step-size up to 2/L converging to a strict saddle point, given one uniformly random initialization, is zero. This extends previous results up to the sharp limit imposed by the convex case. In addition, the arguments hold in the case when a learning rate schedule is given, with either a continuous decaying rate or a piece-wise constant schedule.

artificial intelligence, machine learning, strict saddle, (15 more...)

arXiv.org Machine Learning

1908.01753

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)

Add feedback

Provable Bregman-divergence based Methods for Nonconvex and Non-Lipschitz Problems

Li, Qiuwei, Zhu, Zhihui, Tang, Gongguo, Wakin, Michael B.

arXiv.org Machine LearningApr-21-2019

The (global) Lipschitz smoothness condition is crucial in establishing the convergence theory for most optimization methods. Unfortunately, most machine learning and signal processing problems are not Lipschitz smooth. This motivates us to generalize the concept of Lipschitz smoothness condition to the relative smoothness condition, which is satisfied by any finite-order polynomial objective function. Further, this work develops new Bregman-divergence based algorithms that are guaranteed to converge to a second-order stationary point for any relatively smooth problem. In addition, the proposed optimization methods cover both the proximal alternating minimization and the proximal alternating linearized minimization when we specialize the Bregman divergence to the Euclidian distance. Therefore, this work not only develops guaranteed optimization methods for non-Lipschitz smooth problems but also solves an open problem of showing the second-order convergence guarantees for these alternating minimization methods.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1904.09712

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Stable Opponent Shaping in Differentiable Games

Letcher, Alistair, Foerster, Jakob, Balduzzi, David, Rocktäschel, Tim, Whiteson, Shimon

arXiv.org Artificial IntelligenceNov-20-2018

A growing number of learning methods are actually \emph{games} which optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL. Opponent shaping is a powerful approach to improve learning dynamics in such games, accounting for the fact that the 'environment' includes agents adapting to one another's updates. Learning with Opponent-Learning Awareness (LOLA) is a recent algorithm which exploits this dynamic response and encourages cooperation in settings like the Iterated Prisoner's Dilemma. Although experimentally successful, we show that LOLA can exhibit 'arrogant' behaviour directly at odds with convergence. In fact, remarkably few algorithms have theoretical guarantees applying across all differentiable games. In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. We prove that LookAhead locally converges and avoids strict saddles in \emph{all differentiable games}, the strongest results in the field so far. SOS inherits these desirable guarantees, while also shaping the learning of opponents and consistently either matching or outperforming LOLA experimentally.

artificial intelligence, lola, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1811.08469

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)

Add feedback

Escaping from Saddle Points

@machinelearnbotMar-24-2016, 13:05:31 GMT

Non-convex functions can be much more complicated. In this post we will discuss various types of critical points that you might encounter when you go off the convex path. In particular, we will see in many cases simple heuristics based on gradient descent can lead you to a local minimum in polynomial time. Here \eta is a small step size. This is the gradient descent algorithm.

artificial intelligence, machine learning, saddle point, (17 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Add feedback